Reading a string line by line

by: Anilt, 9 years ago


Hi,
I am trying to read a string that contains data like this:
good
Good
good
better
good
excellent
good
Can you please let me know how to read it line by line and count the most repeated word? I tried using StringIO and string to read it line by line and a counter for the counting, but I could not get it to work.

Regards
Anil






You can use Counter from the collections module to count words. To read line by line, take your string and call .split('\n'), which splits it on the newline character.
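As a rough illustration of that suggestion, here is a minimal sketch. It assumes the data is already in memory as one string with a newline between words, and that "Good" and "good" should count as the same word; the function name most_common_line is made up for this example.

```python
from collections import Counter

# Sample string built from the words in the question, one word per line.
text = "good\nGood\ngood\nbetter\ngood\nexcellent\ngood"

def most_common_line(s):
    """Return (word, count) for the most frequent non-empty line, ignoring case."""
    # Split on the newline character, then normalize each line.
    lines = [line.strip().lower() for line in s.split("\n")]
    # Count only non-empty lines.
    counts = Counter(line for line in lines if line)
    # most_common(1) returns a list holding the single top (word, count) pair.
    # Assumption: the string has at least one non-empty line.
    return counts.most_common(1)[0]

word, count = most_common_line(text)
print(word, count)  # good 5
```

If the case should be kept as written, dropping the .lower() call would count "Good" and "good" separately.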

-Harrison 9 years ago
Last edited 9 years ago



Hi Harrison,

It is reading only the first line of the string. Can you please help me with this? The code I have written is below.

import nltk
#import re
#from nltk.draw.tree import draw_trees
import string
import io
import StringIO
import counter
from nltk.corpus import state_union
from nltk.tokenize import word_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.tag import pos_tag
text=state_union.raw("/home/hduser/mail_file")
cust_tonizer=PunktSentenceTokenizer(text)
tokenized=cust_tonizer.tokenize(text)
words=word_tokenize(text)
tags=pos_tag(words)
for t in tags:
    if t[1] == "JJ" or t[1] == "JJR":
       data = t[0]
       #matches=value.append(t[0])
       #fdist = nltk.FreqDist(value)
       #most_common = fdist.max()
       #top_three = fdist.keys()[:3]
       print data
       #print top_three
    else:
         continue

val=data.split('n')
counter={}
for line in val:
    print line
    counter[line] = counter.get(line, 0) + 1
    cnt=sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
    print cnt
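
A likely reason only one value shows up in the code above: data is reassigned on every pass of the loop, so once the loop ends it holds just the last adjective, and split('n') splits on the letter n rather than on a newline. One possible restructuring is to collect the adjectives in a list and count them once after the loop. The sketch below assumes the tagged words are (word, tag) pairs like the ones pos_tag returns; the sample list and the name top_adjectives are hypothetical and only there so the example runs without NLTK.

```python
from collections import Counter

# Hypothetical stand-in for the result of pos_tag(words): (word, tag) pairs.
tags = [
    ("good", "JJ"), ("service", "NN"), ("Good", "JJ"), ("better", "JJR"),
    ("good", "JJ"), ("excellent", "JJ"), ("was", "VBD"), ("good", "JJ"),
]

def top_adjectives(tagged, n=3):
    """Return the n most frequent adjectives as (word, count) pairs, ignoring case."""
    # Keep every adjective instead of overwriting a single variable.
    adjectives = [word.lower() for word, tag in tagged if tag in ("JJ", "JJR")]
    # Count once, after all adjectives have been collected.
    return Counter(adjectives).most_common(n)

print(top_adjectives(tags))
```

With real data, the list passed in would be the tags variable from the code above instead of the sample list.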


-Anilt 9 years ago
Last edited 9 years ago



Hello, my name is Mike and I am new to this forum. I have been learning Python for a few months but I think I could answer the question in this thread.

This is how I would do the task:

marks = dict()                        # Empty dictionary, will include the frequency of all results

# I assume the results (good, better, etc.) are in a text file
with open('results.txt') as fhand:    # The with block closes the file automatically
    for line in fhand:                # Going through the marks in the text file
        line = line.lower().strip()   # All words are lowercased, surrounding whitespace is removed
        if line:                      # Skip blank lines
            marks[line] = marks.get(line, 0) + 1

max_value = 0                         # This pair of variables will store the most frequent word
max_key = ''                          # '' is an empty string

for key, value in marks.items():
    if value > max_value:
        max_value = value
        max_key = key

print(max_key, max_value)             # Hope this is what you are looking for.


Please let me know if the code doesn't work properly.

Here is the link to the code (PNG file): https://goo.gl/YDr8M6

Mike

-mnalevanko 9 years ago
Last edited 9 years ago



Thanks for sharing Mike! I went ahead and wrote up a way for people to use tags: [code] [/code] to make code much easier to read here.

-Harrison 9 years ago
Last edited 9 years ago
